Abstract



E. Thorp introduced the following card shuffling model. Suppose the number of cards $n$ is even.
Cut the deck into two equal piles. Drop the first card from the left pile or from the right pile
according to the outcome of a fair coin flip. Then drop from the other pile. Continue this way
until both piles are empty. We show that if $n$ is a power of 2 then the mixing time of the Thorp
shuffle is $O(\log^3 n)$. Previously, the best known bound was $O(\log^4 n)$.
1 Introduction

Card shuffling has a rich history in mathematics, dating back to work of Markov [J] and Poincaré.
A basic problem is to determine the mixing time, i.e., the number of shuffles necessary to
mix up the deck (see Section 3 for a precise definition). In [6], the author found a general method
that reduces bounding the mixing time of a card shuffle to verifying a local condition that involves 
only pairs of cards. This was used to give mixing time bounds for the Thorp shuffle and Durrett's 
L-reversal chain. In the present paper, we build on the techniques of [6] and get an improved 
analysis of the Thorp shuffle. 

2 Previous work 



Thorp [13] introduced the following card shuffling model in 1973. Assume that the number of cards,
n, is even. Cut the deck into two equal piles. Drop the first card from the left pile or the right pile 
according to the outcome of a fair coin flip; then drop from the other pile. Continue this way, with 
independent coin flips deciding whether to drop left-right or right-left each time, until both 
piles are empty. 
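As a concrete illustration of this description, here is a minimal Python sketch of one Thorp shuffle. It assumes conventions that the text leaves open: the deck is a list with index 0 as the bottom card, the bottom half is the "left" pile and the top half the "right" pile, and `coins[i] == 0` means that card `i` of the left pile is dropped before card `i` of the right pile. The function name `thorp_shuffle` is ours, not the paper's.

```python
import random
from typing import List, Optional, Sequence


def thorp_shuffle(deck: Sequence[int], coins: Optional[Sequence[int]] = None) -> List[int]:
    """One Thorp shuffle of an even-sized deck (index 0 = bottom card, assumed)."""
    n = len(deck)
    if n % 2 != 0:
        raise ValueError("the Thorp shuffle needs an even number of cards")
    half = n // 2
    if coins is None:
        # one independent fair coin flip for each pair of dropped cards
        coins = [random.getrandbits(1) for _ in range(half)]
    left, right = list(deck[:half]), list(deck[half:])
    new_deck: List[int] = []
    for i in range(half):
        if coins[i] == 0:
            new_deck.extend([left[i], right[i]])  # drop left, then right
        else:
            new_deck.extend([right[i], left[i]])  # drop right, then left
    return new_deck


print(thorp_shuffle(list(range(8)), [0, 0, 0, 0]))  # [0, 4, 1, 5, 2, 6, 3, 7]
print(thorp_shuffle(list(range(8)), [1, 0, 0, 1]))  # [4, 0, 1, 5, 2, 6, 7, 3]
```

With all coins equal to 0 the shuffle is a perfect interleaving of the two halves; the coins only decide the order within each dropped pair.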

Analyzing the Thorp shuffle is an old problem with theoretical roots. Recently, however, the
Thorp shuffle has also found applications in cryptography. The author, Phil Rogaway and
Till Stegers have used the Thorp shuffle as the basis for a practical algorithm for encoding small
messages such as social security numbers and credit card numbers (see [9]). To analyze the
algorithm, it is important to have good bounds on the mixing time.

The Thorp shuffle, despite its simple description, has been hard to analyze. Determining its
mixing time has been called the "longest-standing open card shuffling problem" [3]. In [10] the
author obtained the first polylogarithmic upper bound, proving a bound of $O(\log^{44} n)$, valid when $n$ is a
power of 2. Montenegro and Tetali [8] built on this to get a bound of $O(\log^{29} n)$. In [6] the bound
was improved to $O(\log^4 n)$, with no power-of-two assumption. In the present paper we show that
if the number of cards is a power of two, then the mixing time is $O(\log^3 n)$.

3 Background 

In this section we give some basic definitions and recall some notation from [6]. Let $p(x,y)$ be
transition probabilities for a Markov chain on a finite state space $V$ with a uniform stationary
distribution. For probability measures $\mu$ and $\nu$ on $V$, define the total variation distance
$\|\mu - \nu\| = \sum_{x \in V} |\mu(x) - \nu(x)|$, and define the mixing time

$$T_{\mathrm{mix}} = \min\{n : \|p^n(x, \cdot) - \mathcal{U}\| \le \tfrac12 \text{ for all } x \in V\}, \qquad (1)$$

where $\mathcal{U}$ denotes the uniform distribution.

For a probability distribution $p = \{p_i : i \in V\}$, define the (relative) entropy of $p$ by
$\mathrm{ENT}(p) = \sum_{i \in V} p_i \log(|V| p_i)$, where we define $0 \log 0 = 0$. The following well-known
inequality (Pinsker's inequality) links relative entropy to total variation distance:

$$\|p - \mathcal{U}\| \le \sqrt{2\,\mathrm{ENT}(p)}. \qquad (2)$$
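Both quantities in (2) are easy to compute for small examples. The sketch below assumes the conventions used above (total variation as the full sum of absolute differences, natural logarithms, and $|V|$ equal to the length of the probability vector); the helper names are ours. It checks inequality (2) on one arbitrary distribution on four points.

```python
import math
from typing import Sequence


def total_variation(mu: Sequence[float], nu: Sequence[float]) -> float:
    # ||mu - nu|| = sum_x |mu(x) - nu(x)|  (no factor 1/2, as in the text)
    return sum(abs(a - b) for a, b in zip(mu, nu))


def relative_entropy(p: Sequence[float]) -> float:
    # ENT(p) = sum_i p_i log(|V| p_i), natural log, with 0 log 0 = 0
    size = len(p)
    return sum(pi * math.log(size * pi) for pi in p if pi > 0)


p = [0.5, 0.25, 0.125, 0.125]
uniform = [0.25] * 4
tv = total_variation(p, uniform)   # 0.5
ent = relative_entropy(p)          # 0.25 * log 2, about 0.173
print(tv, ent, math.sqrt(2 * ent))
assert tv <= math.sqrt(2 * ent)    # inequality (2)
```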

If $X$ is a random variable (or random permutation) taking finitely many values, define $\mathrm{ENT}(X)$
as the relative entropy of the distribution of $X$. Note that if $\mathbf{P}(X = i) = p_i$ for $i \in V$, then
$\mathrm{ENT}(X) = \mathbf{E}(\log(|V| p_X))$. We shall think of the distribution of a random permutation in $S_n$ as a
sequence of probabilities of length $n!$, indexed by permutations in $S_n$. If $\mathcal{F}$ is a sigma-field, then we
shall write $\mathrm{ENT}(X \mid \mathcal{F})$ for the relative entropy of the conditional distribution of $X$ given $\mathcal{F}$. Note
that $\mathrm{ENT}(X \mid \mathcal{F})$ is a random variable. If $\pi$ is a random permutation in $S_n$, then for $1 \le k \le n$,
define $\mathcal{F}_k = \sigma(\pi^{-1}(k), \ldots, \pi^{-1}(n))$, and define $\mathrm{ENT}(\pi, k) = \mathrm{ENT}(\pi^{-1}(k) \mid \mathcal{F}_{k+1})$ (where we think
of the conditional distribution of $\pi^{-1}(k)$ given $\mathcal{F}_{k+1}$ as being a sequence of length $k$). The standard
entropy chain rule (see, e.g., [2]) gives the following proposition.

Proposition 1 For any $i \le n$ we have

$$\mathrm{ENT}(\pi) = \mathbf{E}\big(\mathrm{ENT}(\pi \mid \mathcal{F}_i)\big) + \sum_{k=i}^{n} \mathbf{E}\big(\mathrm{ENT}(\pi, k)\big).$$

To compute the relative entropy in the first term on the right-hand side, we think of the distribution
of $\pi$ given $\mathcal{F}_i$ as a sequence of probabilities of length $(i-1)!$.

Remark: Substituting $i = 1$ into the formula gives $\mathrm{ENT}(\pi) = \sum_{k=1}^{n} \mathbf{E}(\mathrm{ENT}(\pi, k))$. □
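The Remark can be checked numerically on a small deck. The sketch below is our own illustration, not code from [6]: it stores a distribution on $S_n$ as a dictionary from tuples `perm` (with `perm[i - 1]` the location of card `i`) to probabilities, computes $\mathbf{E}(\mathrm{ENT}(\pi, k))$ by grouping on the cards at locations $k+1, \ldots, n$, and compares $\sum_k \mathbf{E}(\mathrm{ENT}(\pi, k))$ with $\mathrm{ENT}(\pi)$ for a randomly weighted distribution on $S_4$.

```python
import math
import random
from collections import defaultdict
from itertools import permutations
from typing import Dict, Iterable, Tuple

Perm = Tuple[int, ...]


def ent(probs: Iterable[float], size: int) -> float:
    # relative entropy of a probability sequence of the given length
    return sum(q * math.log(size * q) for q in probs if q > 0)


def inverse(perm: Perm) -> Perm:
    # perm[i - 1] = location of card i; inv[loc - 1] = card at location loc
    inv = [0] * len(perm)
    for card, loc in enumerate(perm, start=1):
        inv[loc - 1] = card
    return tuple(inv)


def expected_ent_at(dist: Dict[Perm, float], k: int) -> float:
    # E(ENT(pi, k)), where ENT(pi, k) = ENT(pi^{-1}(k) | F_{k+1})
    groups: Dict[Perm, Dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for perm, pr in dist.items():
        inv = inverse(perm)
        groups[inv[k:]][inv[k - 1]] += pr  # key: cards at locations k+1..n
    total = 0.0
    for cond in groups.values():
        mass = sum(cond.values())
        # conditional law of pi^{-1}(k), viewed as a sequence of length k
        total += mass * ent((v / mass for v in cond.values()), k)
    return total


n = 4
rng = random.Random(0)
weights = {p: rng.random() for p in permutations(range(1, n + 1))}
z = sum(weights.values())
dist = {p: w / z for p, w in weights.items()}

lhs = ent(dist.values(), math.factorial(n))
rhs = sum(expected_ent_at(dist, k) for k in range(1, n + 1))
print(lhs, rhs)  # equal up to rounding
```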

If we think of $\pi$ as representing the order of a deck of cards, with $\pi(i)$ = location of card $i$,
then this allows us to think of $\mathbf{E}(\mathrm{ENT}(\pi, k))$ as the portion of the overall entropy $\mathrm{ENT}(\pi)$ that is
attributable to the location $k$. We will also need the following proposition.

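Proposition 2 Let $\nu_1$ and $\nu_2$ be random permutations on $\{0, \ldots, n-1\}$. Suppose that there is a
set $W \subset \{0, 1, \ldots, n-1\}$ such that $\nu_1^{-1}(x) = \nu_2^{-1}(x)$ for all $x \in W$. Let $\mathcal{F} = \sigma(\nu_1^{-1}(x) : x \in W)$.
Then

$$\mathrm{ENT}(\nu_1) - \mathrm{ENT}(\nu_2) = \mathbf{E}\big(\mathrm{ENT}(\nu_1 \mid \mathcal{F}) - \mathrm{ENT}(\nu_2 \mid \mathcal{F})\big).$$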
Proof: By the chain rule for entropy, for $i = 1, 2$ we can write

$$\mathrm{ENT}(\nu_i) = \mathrm{ENT}(\nu_i^{-1}(x) : x \in W) + \mathbf{E}\big(\mathrm{ENT}(\nu_i \mid \mathcal{F})\big).$$

Since the first term doesn't depend on $i$, the proposition follows. □

Definition 3 For $p, q \ge 0$, define $d(p, q) = \frac{p}{2} \log p + \frac{q}{2} \log q - \frac{p+q}{2} \log\left(\frac{p+q}{2}\right)$.

We will need the following proposition, which is easily verified using calculus. 
Proposition 4 ([6]) Fix $p \ge 0$. The function $d(p, \cdot)$ is convex.

Observe that $d(p, q) \ge 0$, with equality iff $p = q$, by the strict convexity of the function
$x \mapsto x \log x$. If $p = \{p_i : i \in V\}$ and $q = \{q_i : i \in V\}$ are both probability distributions on $V$, then we
can define the "distance" $d(p,q)$ between $p$ and $q$ by $d(p,q) = \sum_{i \in V} d(p_i, q_i)$. (We use the term
distance loosely and don't claim that $d(\cdot, \cdot)$ satisfies the triangle inequality.) Note that $d(p,q)$ is
the difference between the average of the entropies of $p$ and $q$ and the entropy of the average (i.e.,
an even mixture) of $p$ and $q$.
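Definition 3 and the identity just stated (average of the entropies minus the entropy of the average) can be checked directly. In information-theoretic terms, this $d(p,q)$ is the Jensen-Shannon divergence with natural logarithms. The helper names below are ours and the three-point distributions are arbitrary.

```python
import math
from typing import Sequence


def xlogx(x: float) -> float:
    return x * math.log(x) if x > 0 else 0.0  # convention 0 log 0 = 0


def d_pair(p: float, q: float) -> float:
    # d(p, q) = (p/2) log p + (q/2) log q - ((p+q)/2) log((p+q)/2)
    return 0.5 * xlogx(p) + 0.5 * xlogx(q) - xlogx((p + q) / 2)


def d_dist(p: Sequence[float], q: Sequence[float]) -> float:
    # d(p, q) = sum over i in V of d(p_i, q_i)
    return sum(d_pair(a, b) for a, b in zip(p, q))


def ent(p: Sequence[float]) -> float:
    size = len(p)
    return sum(x * math.log(size * x) for x in p if x > 0)


p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
mix = [(a + b) / 2 for a, b in zip(p, q)]
gap = (ent(p) + ent(q)) / 2 - ent(mix)
print(d_dist(p, q), gap)  # the two numbers agree up to rounding
```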

We will use the following projection lemma. 

Lemma 5 Let $X$ and $Y$ be random variables with distributions $p$ and $q$, respectively. Fix a
function $g$ and let $P$ and $Q$ be the distributions of $g(X)$ and $g(Y)$, respectively. Then $d(p,q) \ge d(P,Q)$.
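A quick randomized sanity check of Lemma 5: draw pairs of distributions on a six-point set, push them forward through a few maps $g$, and confirm that $d$ never increases. This is only a numerical check with our own test data and maps, not a proof; the lemma is an instance of the data-processing inequality for the Jensen-Shannon divergence.

```python
import math
import random
from typing import Callable, Dict, List


def xlogx(x: float) -> float:
    return x * math.log(x) if x > 0 else 0.0


def d_dist(p: Dict[int, float], q: Dict[int, float]) -> float:
    # d(p, q) = sum_i [(p_i/2) log p_i + (q_i/2) log q_i - m_i log m_i], m = (p + q)/2
    keys = set(p) | set(q)
    return sum(
        0.5 * xlogx(p.get(i, 0.0)) + 0.5 * xlogx(q.get(i, 0.0))
        - xlogx((p.get(i, 0.0) + q.get(i, 0.0)) / 2)
        for i in keys
    )


def push_forward(p: Dict[int, float], g: Callable[[int], int]) -> Dict[int, float]:
    # distribution of g(X) when X has distribution p
    out: Dict[int, float] = {}
    for x, px in p.items():
        out[g(x)] = out.get(g(x), 0.0) + px
    return out


def random_dist(rng: random.Random, size: int) -> Dict[int, float]:
    w = [rng.random() for _ in range(size)]
    s = sum(w)
    return {i: wi / s for i, wi in enumerate(w)}


rng = random.Random(1)
maps: List[Callable[[int], int]] = [lambda x: x // 2, lambda x: x % 3, lambda x: 0]
violations = 0
for _ in range(1000):
    p, q = random_dist(rng, 6), random_dist(rng, 6)
    for g in maps:
        if d_dist(push_forward(p, g), push_forward(q, g)) > d_dist(p, q) + 1e-12:
            violations += 1
print(violations)  # Lemma 5 predicts 0
```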

Let $\mathcal{U}$ denote the uniform distribution on $V$. Note that if $\mu$ is an arbitrary distribution on $V$, then
$\mathrm{ENT}(\mu)$ and $d(\mu, \mathcal{U})$ are both notions of a distance from $\mu$ to $\mathcal{U}$. The following lemma relates the
two.

Lemma 6 ([6]) For any distribution $\mu$ on $V$ we have

$$d(\mu, \mathcal{U}) \ge \frac{c}{\log |V|}\, \mathrm{ENT}(\mu),$$

for a universal constant $c > 0$.

A card shuffle can be described as a random permutation chosen from a certain probability
distribution. If we start with the identity permutation and each shuffle has the distribution of $\pi$,
then after $t$ steps the cards are distributed like $\pi_1 \cdots \pi_t$, where the $\pi_i$ are i.i.d. copies of $\pi$.

4 Thorp shuffle 

Recall that the Thorp shuffle has the following description. Assume that the number of cards, n, 
is even. Cut the deck into two equal piles. Drop the first card from the left pile or the right pile 
according to the outcome of a fair coin flip; then drop from the other pile. Continue this way, with 
independent coin flips deciding whether to drop left-right or right-left each time, until both 
piles are empty. 

We will actually work with the time reversal of the Thorp shuffle, which has the same mixing
time (since the Thorp shuffle is a random walk on a group; see [12]). For convenience, we assume
that $n = 2^d$ is a power of two. By writing the position of each card, from the bottom card ($0$) to
the top card ($2^d - 1$), in binary, we can view the positions as elements of the $d$-dimensional unit
hypercube $\{0, 1\}^d$. The reverse Thorp (RT) shuffle can then be constructed in the following way
(see, e.g., [9]). Let $Z = \{Z(l,t) : l \in \{0,1\}^{d-1},\ t \in \{0, 1, \ldots\}\}$ be a collection of i.i.d. Bernoulli(1/2)
random variables. Note that $x \in \{0, 1\}^d$ can be written as $x = (L(x), R(x))$, where $L(x)$ and $R(x)$
are the leftmost $d - 1$ bits and the rightmost bit, respectively, of $x$. The transition rule for the RT shuffle
is as follows. At time $t$, suppose that the current state is $X_t = \pi$. Then the new state is $X_{t+1} = \nu \circ \pi$,
where $\nu$ is the permutation that sends

$$(L, R) \mapsto (R \oplus Z(L, t), L).$$
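The transition rule can be implemented with bit operations. In the sketch below (helper names are ours), a state is a list `state` with `state[j]` the position of card `j`, $L(x)$ is `x >> 1`, $R(x)$ is `x & 1`, and the image $(R \oplus Z(L,t), L)$ is the integer `((R ^ Z) << (d - 1)) | L`. The coins $Z(\cdot, t)$ for one step are passed as a list indexed by $l$.

```python
import random
from typing import List, Sequence


def rt_step(state: List[int], d: int, z_t: Sequence[int]) -> List[int]:
    """One reverse Thorp step X_{t+1} = nu o X_t, with z_t[l] playing Z(l, t)."""
    new_state = []
    for x in state:
        left, right = x >> 1, x & 1             # x = (L(x), R(x))
        new_state.append(((right ^ z_t[left]) << (d - 1)) | left)
    return new_state


def run_rt(d: int, steps: int, rng: random.Random) -> List[int]:
    state = list(range(2 ** d))                 # X_0 = identity
    for _ in range(steps):
        z_t = [rng.getrandbits(1) for _ in range(2 ** (d - 1))]  # fresh coins Z(., t)
        state = rt_step(state, d, z_t)
    return state


# With every coin equal to 0, card x moves to ((x & 1) << (d - 1)) | (x >> 1).
print(rt_step(list(range(8)), 3, [0, 0, 0, 0]))  # [0, 4, 1, 5, 2, 6, 3, 7]
```

The two cards at positions $2l$ and $2l+1$ (those with $L = l$) are sent to positions $l$ and $l + 2^{d-1}$, in an order decided by the single coin $Z(l,t)$; under the bottom-to-top labeling assumed here, this separates the pairs that a Thorp shuffle interleaves.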

We are now ready to state the technical result of this paper. 

Lemma 7 Let $X_t$ be the reverse Thorp shuffle with $2^d$ cards. There is a universal constant $c$ such
that if $\mu$ is a random permutation which is independent of $\{X_t\}$, then

$$\mathrm{ENT}(X_d \circ \mu) \le (1 - c/d)\, \mathrm{ENT}(\mu).$$

Before proving this lemma we show how it gives the desired mixing time bound. 

Theorem 8 The mixing time of the reverse Thorp shuffle with $2^d$ cards is $O(d^3)$.

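Proof: Repeated applications of Lemma 7 give

$$\mathrm{ENT}(X_{kd}) \le (1 - c/d)^k\, \mathrm{ENT}(\mathrm{id}) \le e^{-ck/d}\, d\, 2^d,$$

where the second inequality uses $1 - x \le e^{-x}$ and $\mathrm{ENT}(\mathrm{id}) = \log(n!) \le n \log n = d\, 2^d \log 2$.
Now let $a$ be large enough so that $d\, 2^d e^{-ad} \le 1/8$ for all $d \ge 1$. Then if $k = \lceil a d^2 / c \rceil$ we have

$$\mathrm{ENT}(X_{kd}) \le e^{-ck/d}\, d\, 2^d \le e^{-ad}\, d\, 2^d \le \tfrac18,$$

and hence $\|X_{kd} - \mathcal{U}\| \le \tfrac12$ by equation (2). The theorem follows since $k = O(d^2)$, so that $kd = O(d^3)$. □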
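The numerical facts used in this proof can be checked directly. The sketch below assumes $a = 3$ (any sufficiently large $a$ works; the text does not fix one) and checks $\log(n!) \le d\,2^d$ for $n = 2^d$ and $d\,2^d e^{-ad} \le 1/8$. A finite check only illustrates the second fact; for $a = 3$ it holds for every $d$ because the ratio of consecutive terms is $(1 + 1/d)\,2e^{-3} \le 4e^{-3} < 1$, so the largest term is the one at $d = 1$, namely $2e^{-3} \approx 0.0996$.

```python
import math


def log_factorial(n: int) -> float:
    return math.lgamma(n + 1)  # log(n!)


def worst_term(a: float, d_max: int = 60) -> float:
    # max over 1 <= d <= d_max of d * 2**d * exp(-a * d)
    return max(d * 2 ** d * math.exp(-a * d) for d in range(1, d_max + 1))


# ENT(id) = log(n!) <= n log n = d 2^d log 2 <= d 2^d when n = 2^d
assert all(log_factorial(2 ** d) <= d * 2 ** d for d in range(1, 25))
print(worst_term(3.0))  # about 0.0996, attained at d = 1
print(worst_term(2.0))  # about 0.271, so a = 2 would be too small
```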
We now give the proof of Lemma 7.

Proof of Lemma 7: Fix an integer $T \ge 1$. For integers $0 \le j < n$, define $T_j = \lceil \log_2 j \rceil + 1 - T$. Note
that $T_0 \le T_1 \le \cdots \le T_{n-1}$. Let $\tilde Z$ be obtained from $Z$ by flipping the value of $Z(L(X_{T_j}(j)), T_j)$ for all
$j$. More precisely, define

$$\tilde Z(l,t) = \begin{cases} 1 - Z(l,t) & \text{if for some } j \text{ we have } L(X_t(j)) = l \text{ and } T_j = t; \\ Z(l,t) & \text{otherwise.} \end{cases}$$

Let $\{\tilde X_t : t \ge 0\}$ be the reverse Thorp shuffle process defined by using $\tilde Z$ instead of $Z$. For $j$ with
$0 \le j < n$, define $\Gamma_j(X) = (X_1(j), \ldots, X_d(j))$, with a similar definition for $\Gamma_j(\tilde X)$. For $k$ with
$0 \le k \le n$, define

$$\mathcal{F}_k = \sigma(\Gamma_j(X), \Gamma_j(\tilde X) : j \ge k) = \sigma(X_t(j), \tilde X_t(j) : j \ge k,\ 0 \le t \le d).$$
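The coupling can be simulated on a small deck. The sketch below is our own illustration under two stated assumptions: the coins $Z(\cdot, t)$ drive the step from time $t$ to $t+1$ for $0 \le t < d$, and, since $\lceil \log_2 0 \rceil$ is undefined, we set $T_0 = -\infty$ (card 0 never triggers a flip). It computes $\lceil \log_2 j \rceil$ as `(j - 1).bit_length()` for $j \ge 1$, builds $\tilde Z$ from $Z$ as in the definition of $\tilde Z$, runs both processes, and returns the trajectories from which $\Gamma_j(X)$ and $\Gamma_j(\tilde X)$ can be read off.

```python
import math
import random
from typing import List, Tuple


def flip_time(j: int, T: int) -> float:
    # T_j = ceil(log2 j) + 1 - T; assumption: T_0 = -infinity
    if j == 0:
        return -math.inf
    return (j - 1).bit_length() + 1 - T


def step(state: List[int], d: int, z_t: List[int]) -> List[int]:
    # reverse Thorp move (L, R) -> (R xor Z(L, t), L) for every card
    return [(((x & 1) ^ z_t[x >> 1]) << (d - 1)) | (x >> 1) for x in state]


def coupled_rt(d: int, T: int, rng: random.Random) -> Tuple[List[List[int]], List[List[int]]]:
    n = 2 ** d
    times = [flip_time(j, T) for j in range(n)]
    x = list(range(n))            # X_t(j) = position of card j at time t
    x_tilde = list(range(n))      # the same for the process driven by Z~
    traj, traj_tilde = [x], [x_tilde]
    for t in range(d):
        z_t = [rng.getrandbits(1) for _ in range(n // 2)]  # Z(l, t)
        z_tilde_t = list(z_t)
        for j in range(n):
            if times[j] == t:                # flip Z(L(X_t(j)), t)
                l = x[j] >> 1
                z_tilde_t[l] = 1 - z_t[l]
        x = step(x, d, z_t)
        x_tilde = step(x_tilde, d, z_tilde_t)
        traj.append(x)
        traj_tilde.append(x_tilde)
    return traj, traj_tilde


traj, traj_tilde = coupled_rt(3, 1, random.Random(0))
gamma_1 = [traj[t][1] for t in range(1, 4)]              # Gamma_1(X)
gamma_1_tilde = [traj_tilde[t][1] for t in range(1, 4)]  # Gamma_1(X~)
print(gamma_1, gamma_1_tilde)
```

With $d = 3$ and $T = 1$ we get $T_1 = 0$, so the coin shared by cards 0 and 1 at time 0 is flipped and both cards already land in different positions under $X$ and $\tilde X$ at time 1.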
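Since $\mathcal{F}_n$ is trivial and $X_d$ is $\mathcal{F}_0$-measurable (so that, given $\mathcal{F}_0$, $X_d$ is a fixed permutation independent of $\mu$ and $\mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_0) = \mathrm{ENT}(\mu)$), we have

$$\mathrm{ENT}(X_d \circ \mu) - \mathrm{ENT}(\mu) = \mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_n) - \mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_0) \qquad (3)$$
$$= \sum_{j=0}^{n-1} \mathbf{E}\big(\mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_{j+1}) - \mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_j)\big). \qquad (4)$$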

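We claim that for all $j$ with $0 \le j < n$ we have

$$\mathbf{E}\big(\mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_{j+1}) - \mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_j)\big) \le -\frac{c}{d}\, \mathbf{E}\big(\mathrm{ENT}(\mu, j)\big), \qquad (5)$$

where $c > 0$ is a universal constant and $\mathrm{ENT}(\mu, j) = \mathrm{ENT}(\mu^{-1}(j) \mid \mu^{-1}(j+1), \ldots, \mu^{-1}(n-1))$ is the analogue, for positions $0, \ldots, n-1$, of the quantity defined in Section 3. Note that combining this with equation (4) gives

$$\mathrm{ENT}(X_d \circ \mu) - \mathrm{ENT}(\mu) \le -\sum_{j=0}^{n-1} \frac{c}{d}\, \mathbf{E}\big(\mathrm{ENT}(\mu, j)\big) = -\frac{c}{d}\, \mathrm{ENT}(\mu), \qquad (6)$$

where the equality is the Remark following Proposition 1. This proves the lemma. It remains to verify equation (5).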

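For $j$ with $0 \le j < n$, define $\mathcal{F}'_j = \sigma(\mathcal{F}_{j+1}, \{\Gamma_j(X), \Gamma_j(\tilde X)\})$. Note that this is the sigma-field
generated by $\mathcal{F}_{j+1}$ and the unordered set $\{\Gamma_j(X), \Gamma_j(\tilde X)\}$. Note that $\mathcal{F}'_j \supset \mathcal{F}_{j+1}$. Hence for all $j$
with $0 \le j < n$ we have

$$\mathbf{E}\big(\mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_{j+1})\big) \le \mathbf{E}\big(\mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}'_j)\big), \qquad (7)$$

by Jensen's inequality applied to $x \mapsto x \log x$. Let $W = \{X_d(j+1), \ldots, X_d(n-1)\}$ and let $\mathcal{G}_{j+1}$ denote
the sigma-field generated by $(X_d \circ \mu)^{-1}(x)$ for $x \in W$. Let $\mathcal{G}'_{j+1} = \sigma(\mu^{-1}(j+1), \ldots, \mu^{-1}(n-1))$.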

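Then

$$\mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}'_j) - \mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_j) = \mathbf{E}\big(\mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}'_j, \mathcal{G}_{j+1}) \qquad (8)$$
$$- \mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_j, \mathcal{G}_{j+1})\big) \qquad (9)$$
$$= \mathbf{E}\big(\mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}'_j, \mathcal{G}'_{j+1}) \qquad (10)$$
$$- \mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_j, \mathcal{G}'_{j+1})\big), \qquad (11)$$

where the first equality holds by Proposition 2, and the second holds because $W$ is $\mathcal{F}_{j+1}$-measurable and
$(X_d \circ \mu)^{-1}(X_d(i)) = \mu^{-1}(i)$, so that conditioning on $\mathcal{G}_{j+1}$ amounts to conditioning on $\mathcal{G}'_{j+1}$. Note that
$\mathcal{F}_{j+1} = \sigma(S, Z(l,t) : (l,t) \in S)$, where

$$S = \{(l,t) : L(X_t(i)) = l \text{ or } L(\tilde X_t(i)) = l \text{ for some } i > j\};$$

that is, $S$ is the collection of bits used to generate $\Gamma_i(X)$ and $\Gamma_i(\tilde X)$ for $i > j$.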

We shall refer to indices $i$ with $0 \le i < n$ as cards. Say that cards $i$ and $j$ are adjacent at time
$t$ if $L(X_t(i)) = L(X_t(j))$. If $T_j \ge 0$, let $m(j)$ be the card adjacent to $j$ at time $T_j$.

Note that $\mathcal{F}_j = \sigma(\mathcal{F}'_j, Z(L(X_{T_j}(j)), T_j))$. Therefore, on the event that $(L(X_{T_j}(j)), T_j) \in S$,
the expression on the left-hand side of (8) is 0. However, we now show that if $m(j) < j$, then
$(L(X_{T_j}(j)), T_j) \notin S$.

Note that if $(l,t) \in S$, then either $L(X_t(i)) = l$ for some $i > j$, or $t > T_i$ for some $i > j$
(and hence $t > T_j$). Thus if $m(j) < j$, then $(L(X_{T_j}(j)), T_j) \notin S$: the first alternative would make some $i > j$ adjacent to $j$ at time $T_j$, forcing $m(j) = i > j$, and the second fails because $t = T_j$. So on the event that $m(j) < j$
and $\{\Gamma_j(X), \Gamma_j(\tilde X)\} = \{\Gamma, \Gamma'\}$, the conditional distribution of $(\Gamma_j(X), \Gamma_j(\tilde X))$ given $\mathcal{F}'_j$ is an even
mixture of $(\Gamma, \Gamma')$ and $(\Gamma', \Gamma)$, according to the value of $Z(L(X_{T_j}(j)), T_j)$.
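Let $\mathcal{L}(W \mid \mathcal{F})$ denote the conditional distribution of a random variable (or random permutation)
$W$ given the sigma-field $\mathcal{F}$. Note that on this event

$$\mathcal{L}(X_d \circ \mu \mid \mathcal{F}'_j, \mathcal{G}_{j+1}) = \tfrac12 \mathcal{L}(X_d \circ \mu \mid \mathcal{F}_j, \mathcal{G}_{j+1}) + \tfrac12 \mathcal{L}(\tilde X_d \circ \mu \mid \mathcal{F}_j, \mathcal{G}_{j+1}).$$

Therefore,

$$\mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}'_j, \mathcal{G}_{j+1}) \le \tfrac12 \mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_j, \mathcal{G}_{j+1}) + \tfrac12 \mathrm{ENT}(\tilde X_d \circ \mu \mid \mathcal{F}_j, \mathcal{G}_{j+1}) - d\big(\mathcal{L}(X_d \circ \mu \mid \mathcal{F}_j, \mathcal{G}_{j+1}), \mathcal{L}(\tilde X_d \circ \mu \mid \mathcal{F}_j, \mathcal{G}_{j+1})\big)$$
$$= \mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_j, \mathcal{G}_{j+1}) - d\big(\mathcal{L}(X_d \circ \mu \mid \mathcal{F}_j, \mathcal{G}_{j+1}), \mathcal{L}(\tilde X_d \circ \mu \mid \mathcal{F}_j, \mathcal{G}_{j+1})\big).$$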

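But by the projection lemma (Lemma 5), applied with the map $\sigma \mapsto \sigma^{-1}(X_d(j))$,

$$d\big(\mathcal{L}(X_d \circ \mu \mid \mathcal{F}_j, \mathcal{G}_{j+1}), \mathcal{L}(\tilde X_d \circ \mu \mid \mathcal{F}_j, \mathcal{G}_{j+1})\big) \ge d\big(\mathcal{L}((X_d \circ \mu)^{-1}(X_d(j)) \mid \mathcal{F}_j, \mathcal{G}_{j+1}), \mathcal{L}((\tilde X_d \circ \mu)^{-1}(X_d(j)) \mid \mathcal{F}_j, \mathcal{G}_{j+1})\big)$$
$$= d\big(\mathcal{L}(\mu^{-1}(j) \mid \mathcal{F}_j, \mathcal{G}_{j+1}), \mathcal{L}(\mu^{-1}(m(j)) \mid \mathcal{F}_j, \mathcal{G}_{j+1})\big).$$

Since $\mu$ is independent of $X$ and $\tilde X$, this last quantity is $d(\mathcal{L}(\mu^{-1}(j) \mid \mathcal{G}'_{j+1}), \mathcal{L}(\mu^{-1}(m(j)) \mid \mathcal{G}'_{j+1}))$. Combining this
with equation (8) gives

$$\mathbf{E}\big(\mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}'_j) - \mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_j)\big) \le -\mathbf{E}\Big(d\big(\mathcal{L}(\mu^{-1}(j) \mid \mathcal{G}'_{j+1}), \mathcal{L}(\mu^{-1}(m(j)) \mid \mathcal{G}'_{j+1})\big)\,\mathbf{1}\{m(j) < j\}\Big). \qquad (12)$$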

Let $j_{d-1} j_{d-2} \cdots j_0$ be the binary representation of $j$. For cards $k$ and $j$, write $D(k,j) = \max\{i : k_i \ne j_i\}$.
Note that $D(k,j)$ is the minimum value of $t$ such that there is positive probability that $k$ and $j$
are adjacent after $t$ steps. For $t \ge 0$, let $B(j,t) = \{k : D(k,j) = t\}$. For convenience, let $B(j,t) = \emptyset$
if $t < 0$. Let $I = \{0, 1, \ldots, j-1\}$. Note that if $k \in B(j, T_j) \cap I$, then $\mathbf{P}(m(j) = k) = (\tfrac12)^{T_j}$.
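The two facts about $D(k,j)$ used here (cards $k$ and $j$ cannot be adjacent before time $D(k,j)$, and at time $t = D(k,j)$ they are adjacent with probability $2^{-t}$) can be verified by exhaustive enumeration of the coins for a small deck. The sketch below does this for $d = 3$, our choice to keep the enumeration to at most $2^8$ coin configurations; the helper names are ours, and the deck starts from the identity as in the proof.

```python
from fractions import Fraction
from itertools import product
from typing import List, Sequence


def step(state: List[int], d: int, z_t: Sequence[int]) -> List[int]:
    # reverse Thorp move (L, R) -> (R xor Z(L, t), L)
    return [(((x & 1) ^ z_t[x >> 1]) << (d - 1)) | (x >> 1) for x in state]


def D(k: int, j: int) -> int:
    # largest bit index i with k_i != j_i (requires k != j)
    return (k ^ j).bit_length() - 1


def adjacency_prob(d: int, j: int, k: int, t: int) -> Fraction:
    # exact P(L(X_t(j)) = L(X_t(k))) with X_0 = identity
    half = 2 ** (d - 1)
    hits = total = 0
    for bits in product((0, 1), repeat=t * half):
        state = list(range(2 ** d))
        for s in range(t):
            state = step(state, d, bits[s * half:(s + 1) * half])
        hits += (state[j] >> 1) == (state[k] >> 1)
        total += 1
    return Fraction(hits, total)


d = 3
for j in range(2 ** d):
    for k in range(2 ** d):
        if k == j:
            continue
        t = D(k, j)
        assert adjacency_prob(d, j, k, t) == Fraction(1, 2 ** t)
        assert all(adjacency_prob(d, j, k, s) == 0 for s in range(t))
print("checked all pairs for d =", d)
```

The reason is visible in the bit picture: after $t$ steps the leading $t$ bits of a card's position are fresh coin flips, and before time $D(k,j)$ the two cards sit in different pairs and so use independent coins.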
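Equation (12) implies that

$$\mathbf{E}\big(\mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}'_j) - \mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_j)\big) \le -\sum_{k \in B(j,T_j) \cap I} \mathbf{P}(m(j) = k)\, \mathbf{E}\Big(d\big(\mathcal{L}(\mu^{-1}(j) \mid \mathcal{G}'_{j+1}), \mathcal{L}(\mu^{-1}(k) \mid \mathcal{G}'_{j+1})\big)\Big)$$
$$= -\sum_{k \in B(j,T_j) \cap I} \left(\tfrac12\right)^{r+1-T} \mathbf{E}\Big(d\big(\mathcal{L}(\mu^{-1}(j) \mid \mathcal{G}'_{j+1}), \mathcal{L}(\mu^{-1}(k) \mid \mathcal{G}'_{j+1})\big)\Big),$$

where $r = \lceil \log_2 j \rceil$, so that $T_j = r + 1 - T$. It follows that if $T$ is a random variable, independent of $Z$ and $\mu$, and $T_j = r + 1 - T$, then

$$\mathbf{E}\big(\mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}'_j) - \mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_j)\big) \le -\mathbf{E}\bigg(\sum_{k \in B(j,T_j) \cap I} \left(\tfrac12\right)^{r+1-T} \mathbf{E}\Big(d\big(\mathcal{L}(\mu^{-1}(j) \mid \mathcal{G}'_{j+1}), \mathcal{L}(\mu^{-1}(k) \mid \mathcal{G}'_{j+1})\big)\Big)\bigg).$$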

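In particular, if $T$ is geometric(1/2), i.e., $\mathbf{P}(T = t) = 2^{-t}$ for $t \ge 1$, we have

$$\mathbf{E}\big(\mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}'_j) - \mathrm{ENT}(X_d \circ \mu \mid \mathcal{F}_j)\big) \le -\sum_{t \ge 1} 2^{-t} \sum_{k \in B(j, r+1-t) \cap I} \left(\tfrac12\right)^{r+1-t} \mathbf{E}\Big(d\big(\mathcal{L}(\mu^{-1}(j) \mid \mathcal{G}'_{j+1}), \mathcal{L}(\mu^{-1}(k) \mid \mathcal{G}'_{j+1})\big)\Big)$$
$$= -\left(\tfrac12\right)^{r+1} \sum_{k \in I} \mathbf{E}\Big(d\big(\mathcal{L}(\mu^{-1}(j) \mid \mathcal{G}'_{j+1}), \mathcal{L}(\mu^{-1}(k) \mid \mathcal{G}'_{j+1})\big)\Big),$$

where the equality holds because every $k \in I$ satisfies $0 \le D(k,j) \le r$, so the sets $B(j, r+1-t) \cap I$, $t \ge 1$, partition $I$.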

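Since $j > 2^{r-1}$, we have $(\tfrac12)^{r+1} > \tfrac{1}{4j}$, so this is at most

$$-\frac{1}{4j} \sum_{k \in I} \mathbf{E}\Big(d\big(\mathcal{L}(\mu^{-1}(j) \mid \mathcal{G}'_{j+1}), \mathcal{L}(\mu^{-1}(k) \mid \mathcal{G}'_{j+1})\big)\Big)$$
$$= -\frac{1}{4j} \sum_{k=0}^{j} \mathbf{E}\Big(d\big(\mathcal{L}(\mu^{-1}(j) \mid \mathcal{G}'_{j+1}), \mathcal{L}(\mu^{-1}(k) \mid \mathcal{G}'_{j+1})\big)\Big)$$
$$\le -\frac14\, \mathbf{E}\Big(d\big(\mathcal{L}(\mu^{-1}(j) \mid \mathcal{G}'_{j+1}), \tfrac{1}{j+1} \textstyle\sum_{k=0}^{j} \mathcal{L}(\mu^{-1}(k) \mid \mathcal{G}'_{j+1})\big)\Big)$$
$$\le -\frac{c}{d}\, \mathbf{E}\big(\mathrm{ENT}(\mu, j)\big),$$

for a universal constant $c > 0$. Here the equality holds because the added term $k = j$ vanishes ($d(q,q) = 0$ for every $q$); the first inequality follows from Proposition 4 (Jensen's inequality for the convex function $d(p, \cdot)$, together with $\frac{j+1}{j} \ge 1$); and the second inequality follows from Lemma 6, since the second argument of $d$ is the uniform distribution on the $j+1$ cards not determined by $\mathcal{G}'_{j+1}$ and $\log(j+1) \le d \log 2$.
Combining this with equation (7) verifies equation (5), which completes the proof. □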
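The weight bookkeeping in the last steps can be checked with exact rational arithmetic. Assuming, as above, that $T$ is geometric(1/2) on $\{1, 2, \ldots\}$ and $r = \lceil \log_2 j \rceil$, every card $k < j$ receives total weight $\sum_{t \ge 1} \mathbf{P}(T = t)(\tfrac12)^{r+1-t}\,\mathbf{1}\{D(k,j) = r+1-t\} = 2^{-(r+1)}$, and this is at least $1/(4j)$. The sketch below (helper names ours) confirms both statements for $1 \le j \le 256$.

```python
from fractions import Fraction


def ceil_log2(j: int) -> int:
    return (j - 1).bit_length()  # ceil(log2 j) for j >= 1


def D(k: int, j: int) -> int:
    return (k ^ j).bit_length() - 1  # highest bit where k and j differ


def weight(j: int, k: int) -> Fraction:
    # sum over t >= 1 of P(T = t) * (1/2)^(r+1-t), restricted to k in B(j, r+1-t)
    r = ceil_log2(j)
    total = Fraction(0)
    for t in range(1, r + 2):          # B(j, s) is empty for s = r + 1 - t < 0
        s = r + 1 - t
        if D(k, j) == s:
            total += Fraction(1, 2 ** t) * Fraction(1, 2 ** s)
    return total


for j in range(1, 257):
    r = ceil_log2(j)
    for k in range(j):
        w = weight(j, k)
        assert w == Fraction(1, 2 ** (r + 1))   # exactly one t per k, same weight
        assert w >= Fraction(1, 4 * j)          # because j > 2^(r-1)
print("weights verified for 1 <= j <= 256")
```

The geometric distribution of $T$ is what makes the weight independent of $k$: a larger $D(k,j)$ costs a factor $2^{-D(k,j)}$ in the adjacency probability but is chosen with a correspondingly larger probability $2^{-(r+1-D(k,j))}$.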

The above analysis extends to the non-power-of-two case, and we intend to handle this in the
final version of this paper.